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Abstract 


This document extends RFC 1958 by outlining some of the philosophical 
guidelines to which architects and designers of Internet backbone 


networks should adhere. 


We describe the Simplicity Principle, which 


states that complexity is the primary mechanism that impedes 


efficient scaling, 


and discuss its implications on the architecture, 


design and engineering issues found in large scale Internet 
backbones. 
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1. Introduction 


RFC 1958 [RFC1958] describes the underlying principles of the 
Internet architecture. This note extends that work by outlining some 
of the philosophical guidelines to which architects and designers of 
Internet backbone networks should adhere. While many of the areas 
outlined in this document may be controversial, the unifying 
principle described here, controlling complexity as a mechanism to 
control costs and reliability, should not be. Complexity in carrier 
networks can derive from many sources. However, as stated in 
[DOYLE2002], "Complexity in most systems is driven by the need for 
robustness to uncertainty in their environments and component parts 
far more than by basic functionality". The major thrust of this 
document, then, is to raise awareness about the complexity of some of 
our current architectures, and to examine the effect such complexity 
will almost certainly have on the IP carrier industry’s ability to 
succeed. 


The rest of this document is organized as follows: The first section 
describes the Simplicity Principle and its implications for the 
design of very large systems. The remainder of the document outlines 
the high-level consequences of the Simplicity Principle and how it 
should guide large scale network architecture and design approaches. 
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2. Large Systems and The Simplicity Principle 


The Simplicity Principle, which was perhaps first articulated by Mike 
O’Dell, former Chief Architect at UUNET, states that complexity is 
the primary mechanism which impedes efficient scaling, and as a 
result is the primary driver of increases in both capital 
expenditures (CAPEX) and operational expenditures (OPEX). The 
implication for carrier IP networks then, is that to be successful we 
must drive our architectures and designs toward the simplest possible 
solutions. 


2.1. The End-to-End Argument and Simplicity 


The end-to-end argument, which is described in [SALTZER] (as well as 
in RFC 1958 [RFC1958]), contends that "end-to-end protocol design 

should not rely on the maintenance of state (i.e., information about 
the state of the end-to-end communication) inside the network. Such 
state should be maintained only in the end points, in such a way that 
the state can only be destroyed when the end point itself breaks." 

This property has also been related to Clark’s "fate-sharing" concept 


[CLARK]. We can see that the end-to-end principle leads directly to 
the Simplicity Principle by examining the so-called "hourglass" 
formulation of the Internet architecture [WILLINGER2002]. In this 


model, the thin waist of the hourglass is envisioned as the 
(minimalist) IP layer, and any additional complexity is added above 
the IP layer. In short, the complexity of the Internet belongs at 
the edges, and the IP layer of the Internet should remain as simple 
as possible. 


Finally, note that the End-to-End Argument does not imply that the 
core of the Internet will not contain and maintain state. In fact, a 
huge amount coarse grained state is maintained in the Internet’s core 
(e.g., routing state). However, the important point here is that 
this (coarse grained) state is almost orthogonal to the state 
maintained by the end-points (e.g., hosts). It is this minimization 
of interaction that contributes to simplicity. As a result, 
consideration of "core vs. end-point" state interaction is crucial 
when analyzing protocols such as Network Address Translation (NAT), 
which reduce the transparency between network and hosts. 


2.2. Non-linearity and Network Complexity 


Complex architectures and designs have been (and continue to be) 
among the most significant and challenging barriers to building cost- 
effective large scale IP networks. Consider, for example, the task 
of building a large scale packet network. Industry experience has 
shown that building such a network is a different activity (and hence 
requires a different skill set) than building a small to medium scale 
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network, and as such doesn’t have the same properties. In 
particular, the largest networks exhibit, both in theory and in 
practice, architecture, design, and engineering non-linearities which 
are not exhibited at smaller scale. We call this Architecture, 
Design, and Engineering (ADE) non-linearity. That is, systems such 
as the Internet could be described as highly self-dissimilar, with 
extremely different scales and levels of abstraction [CARLSON]. The 
ADE non-linearity property is based upon two well-known principles 
from non-linear systems theory [THOMPSON]: 


2.2.1. The Amplification Principle 


The Amplification Principle states that there are non-linearities 
which occur at large scale which do not occur at small to medium 
scale. 


COROLLARY: In many large networks, even small things can and do cause 
huge events. In system-theoretic terms, in large systems such as 
these, even small perturbations on the input to a process can 
destabilize the system’s output. 


An important example of the Amplification Principle is non-linear 
resonant amplification, which is a powerful process that can 
transform dynamic systems, such as large networks, in surprising ways 
with seemingly small fluctuations. These small fluctuations may 
slowly accumulate, and if they are synchronized with other cycles, 
may produce major changes. Resonant phenomena are examples of non- 
linear behavior where small fluctuations may be amplified and have 
influences far exceeding their initial sizes. The natural world is 
filled with examples of resonant behavior that can produce system- 
wide changes, such as the destruction of the Tacoma Narrows bridge 
(due to the resonant amplification of small gusts of wind). Other 
examples include the gaps in the asteroid belts and rings of Saturn 
which are created by non-linear resonant amplification. Some 
features of human behavior and most pilgrimage systems are influenced 
by resonant phenomena involving the dynamics of the solar system, 
such as solar days, the 27.3 day (sidereal) and 29.5 day (synodic) 
cycles of the moon or the 365.25 day cycle of the sun. 


In the Internet domain, it has been shown that increased inter- 
connectivity results in more complex and often slower BGP routing 
convergence [AHUJA]. A related result is that a small amount of 
inter-connectivity causes the output of a routing mesh to be 
significantly more complex than its input [GRIFFIN]. An important 
method for reducing amplification is ensure that local changes have 
only local effect (this is as opposed to systems in which local 
changes have global effect). Finally, ATM provides an excellent 
example of an amplification effect: if you lose one cell, you destroy 
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the entire packet (and it gets worse, as in the absence of mechanisms 
such as Early Packet Discard [ROMANOV], you will continue to carry 
the already damaged packet). 


Another interesting example of amplification comes from the 
engineering domain, and is described in [CARLSON]. They consider the 
Boeing 777, which is a "fly-by-wire" aircraft, containing as many as 
150,000 subsystems and approximately 1000 CPUs. What they observe is 
that while the 777 is robust to large-scale atmospheric disturbances, 
turbulence boundaries, and variations in cargo loads (to name a few), 
it could be catastrophically disabled my microscopic alterations ina 
very few large CPUs (as the point out, fortunately this is a very 
rare occurrence). This example illustrates the issue "that 
complexity can amplify small perturbations, and the design engineer 
must ensure such perturbations are extremely rare." [CARLSON] 


2.2.2. The Coupling Principle 


The Coupling Principle states that as things get larger, they often 
exhibit increased interdependence between components. 


COROLLARY: The more events that simultaneously occur, the larger the 
likelihood that two or more will interact. This phenomenon has also 
been termed "unforeseen feature interaction" [WILLINGER2002]. 


Much of the non-linearity observed large systems is largely due to 
coupling. This coupling has both horizontal and vertical 
components. In the context of networking, horizontal coupling is 
exhibited between the same protocol layer, while vertical coupling 
occurs between layers. 


Coupling is exhibited by a wide variety of natural systems, including 
plasma macro-instabilities (hydro-magnetic, e.g., kink, fire-hose, 
mirror, ballooning, tearing, trapped-particle effects) [NAVE], as 
well as various kinds of electrochemical systems (consider the custom 
fluorescent nucleotide synthesis/nucleic acid labeling problem 
[WARD]). Coupling of clock physical periodicity has also been 
observed [JACOBSON], as well as coupling of various types of 
biological cycles. 


Several canonical examples also exist in well known network systems. 
Examples include the synchronization of various control loops, such 
as routing update synchronization and TCP Slow Start synchronization 
[FLOYD, JACOBSON]. An important result of these observations is that 
coupling is intimately related to synchronization. Injecting 
randomness into these systems is one way to reduce coupling. 


Bush, et. al. Informational [Page 5] 


RFC 3439 Internet Architectural Guidelines December 2002 


Interestingly, in analyzing risk factors for the Public Switched 
Telephone Network (PSTN), Charles Perrow decomposes the complexity 
problem along two related axes, which he terms "interactions" and 
"coupling" [PERROW]. Perrow cites interactions and coupling as 
significant factors in determining the reliability of a complex 
system (and in particular, the PSTN). In this model, interactions 
refer to the dependencies between components (linear or non-linear), 
while coupling refers to the flexibility in a system. Systems with 
simple, linear interactions have components that affect only other 
components that are functionally downstream. Complex system 
components interact with many other components in different and 
possibly distant parts of the system. Loosely coupled systems are 
said to have more flexibility in time constraints, sequencing, and 
environmental assumptions than do tightly coupled systems. In 
addition, systems with complex interactions and tight coupling are 
likely to have unforeseen failure states (of course, complex 
interactions permit more complications to develop and make the system 
hard to understand and predict); this behavior is also described in 
[WILLINGER2002]. Tight coupling also means that the system has less 
flexibility in recovering from failure states. 


The PSTN’s SS7 control network provides an interesting example of 
what can go wrong with a tightly coupled complex system. Outages 
such as the well publicized 1991 outage of AT&T’s SS7 demonstrates 
the phenomenon: the outage was caused by software bugs in the 
switches’ crash recovery code. In this case, one switch crashed due 
to a hardware glitch. When this switch came back up, it (plus a 
reasonably probable timing event) caused its neighbors to crash When 
the neighboring switches came back up, they caused their neighbors to 
crash, and so on [NEUMANN] (the root cause turned out to be a 
misplaced '’break’ statement; this is an excellent example of cross- 
layer coupling). This phenomenon is similar to the phase-locking of 
weakly coupled oscillators, in which random variations in sequence 
times plays an important role in system stability [THOMPSON]. 


2.3. Complexity lesson from voice 


In the 1970s and 1980s, the voice carriers competed by adding 
features which drove substantial increases in the complexity of the 
PSTN, especially in the Class 5 switching infrastructure. This 
complexity was typically software-based, not hardware driven, and 
therefore had cost curves worse than Moore’s Law. In summary, poor 
margins on voice products today are due to OPEX and CAPEX costs not 
dropping as we might expect from simple hardware-bound 
implementations. 
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2.4. Upgrade cost of complexity 


Consider the cost of providing new features in a complex network. 

The traditional voice network has little intelligence in its edge 
devices (phone instruments), and a very smart core. The Internet has 
smart edges, computers with operating systems, applications, etc., 
and a simple core, which consists of a control plane and packet 
forwarding engines. Adding an new Internet service is just a matter 
of distributing an application to the a few consenting desktops who 
wish to use it. Compare this to adding a service to voice, where one 
has to upgrade the entire core. 


3. Layering Considered Harmful 


There are several generic properties of layering, or vertical 
integration as applied to networking. In general, a layer as defined 
in our context implements one or more of 


Error Control: The layer makes the "channel" more reliable 
(e.g., reliable transport layer) 


Flow Control: The layer avoids flooding slower peer (e.g., 
ATM flow control) 


Fragmentation: Dividing large data chunks into smaller 
pieces, and subsequent reassembly (e.g., TCP 
MSS fragmentation/reassembly) 


Multiplexing: Allow several higher level sessions share 
single lower level "connection" (e.g., ATM PVC) 


Connection Setup: Handshaking with peer (e.g., TCP three-way 
handshake, ATM ILMT) 


Addressing/Naming: Locating, managing identifiers associated 
with entities (e.g., GOSSIP 2 NSAP Structure 
[RFC1629]) 


Layering of this type does have various conceptual and structuring 
advantages. However, in the data networking context structured 
layering implies that the functions of each layer are carried out 
completely before the protocol data unit is passed to the next layer. 
This means that the optimization of each layer has to be done 
separately. Such ordering constraints are in conflict with efficient 
implementation of data manipulation functions. One could accuse the 
layered model (e.g., TCP/IP and ISO OSI) of causing this conflict. 

In fact, the operations of multiplexing and segmentation both hide 
vital information that lower layers may need to optimize their 
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performance. For example, layer N may duplicate lower level 
functionality, e.g., error recovery hop-hop versus end-to-end error 
recovery. In addition, different layers may need the same 
information (e.g., time stamp): layer N may need layer N-2 
information (e.g., lower layer packet sizes), and the like [WAKEMAN]. 
A related and even more ironic statement comes from Tennenhouse’s 
classic paper, "Layered Multiplexing Considered Harmful" 
[TENNENHOUSE]: "The ATM approach to broadband networking is presently 
being pursued within the CCITT (and elsewhere) as the unifying 
mechanism for the support of service integration, rate adaptation, 
and jitter control within the lower layers of the network 
architecture. This position paper is specifically concerned with the 
jitter arising from the design of the "middle" and "upper" layers 
that operate within the end systems and relays of multi-service 
networks (MSNs)." 


As a result of inter-layer dependencies, increased layering can 
quickly lead to violation of the Simplicity Principle. Industry 
experience has taught us that increased layering frequently increases 
complexity and hence leads to increases in OPEX, as is predicted by 
the Simplicity Principle. A corollary is stated in RFC 1925 
[RFC1925], section 2(5): 


"It is always possible to agglutinate multiple separate problems 
into a single complex interdependent solution. In most cases 
this is a bad idea." 


The first order conclusion then, is that horizontal (as opposed to 
vertical) separation may be more cost-effective and reliable in the 
long term. 


3.1. Optimization Considered Harmful 


A corollary of the layering arguments above is that optimization can 
also be considered harmful. In particular, optimization introduces 
complexity, and as well as introducing tighter coupling between 
components and layers. 


An important and related effect of optimization is described by the 
Law of Diminishing Returns, which states that if one factor of 
production is increased while the others remain constant, the overall 
returns will relatively decrease after a certain point [SPILLMAN]. 
The implication here is that trying to squeeze out efficiency past 
that point only adds complexity, and hence leads to less reliable 
systems. 
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3.2. Feature Richness Considered Harmful 
While adding any new feature may be considered a gain (and in fact 
frequently differentiates vendors of various types of equipment), but 
there is a danger. The danger is in increased system complexity. 


3.3. Evolution of Transport Efficiency for IP 


The evolution of transport infrastructures for IP offers a good 
example of how decreasing vertical integration has lead to various 


IP over WDM 


efficiencies. In particular, 
| IP over ATM over SONET --> 
| IP over SONET over WDM --> 


\|/ 
Decreasing complexity, CAPEX, OPEX 


The key point here is that layers are removed resulting in CAPEX and 
OPEX efficiencies. 


3.4. Convergence Layering 


Convergence is related to the layering concepts described above in 


that convergence is achieved via a "convergence layer". The end 
state of the convergence argument is the concept of Everything Over 
Some Layer (EOSL). Conduit, DWDM, fiber, ATM, MPLS, and even IP have 
all been proposed as convergence layers. It is important to note 
that since layering typically drives OPEX up, we expect convergence 
will as well. This observation is again consistent with industry 
experience. 


There are many notable examples of convergence layer failure. 
Perhaps the most germane example is IP over ATM. The immediate and 
most obvious consequence of ATM layering is the so-called cell tax: 
First, note that the complete answer on ATM efficiency is that it 
depends upon packet size distributions. Let’s assume that typical 
Internet type traffic patterns, which tend to have high percentages 
of packets at 40, 44, and 552 bytes. Recent data [CAIDA] shows that 
about 95% of WAN bytes and 85% of packets are TCP. Much of this 
traffic is composed of 40/44 byte packets. 


Now, consider the case of a a DS3 backbone with PLCP turned on. Then 
the maximum cell rate is 96,000 cells/sec. If you multiply this 
value by the number of bits in the payload, you get: 96000 cells/sec 
* 48 bytes/cell * 8 = 36.864 Mbps. This, however, is unrealistic 
since it 
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assumes perfect payload packing. There are two other things that 
contribute to the ATM overhead (cell tax): The wasted padding and the 
8 byte SNAP header. 


It is the SNAP header which causes most of the problems (and you 
can’t do anything about this), forcing most small packets to consume 
two cells, with the second cell to be mostly empty padding (this 
interacts really poorly with the data quoted above, e.g., that most 
packets are 40-44 byte TCP Ack packets). This causes a loss of about 
another 16% from the 36.8 Mbps ideal throughput. 


So the total throughput ends up being (for a DS3): 


DS3 Line Rate: 44.736 

PLCP Overhead = 4.032 

Per Cell Header: - 3.840 

SNAP Header & Padding: - 5.900 
30.960 Mbps 


Result: With a DS3 line rate of 44.736 Mbps, the total overhead is 
about 31%. 


Another way to look at this is that since a large fraction of WAN 
traffic is comprised of TCP ACKs, one can make a different but 
related calculation. IP over ATM requires: 


IP data (40 bytes in this case) 

8 bytes SNAP 

8 bytes AALS stuff 

5 bytes for each cell 

+ as much more as it takes to fill out the last cell 


On ATM, this becomes two cells - 106 bytes to convey 40 bytes of 
information. The next most common size seems to be one of several 
sizes in the 504-556 byte range - 636 bytes to carry IP, TCP, anda 
512 byte TCP payload - with messages larger than 1000 bytes running 
third. 


One would imagine that 87% payload (556 byte message size) is better 
than 37% payload (TCP Ack size), but it’s not the 95-98% that 
customers are used to, and the predominance of TCP Acks skews the 
average. 
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3.4.1. Note on Transport Protocol Layering 


Protocol layering models are frequently cast as "X over Y" models. 

In these cases, protocol Y carries protocol X’s protocol data units 
(and possibly control data) over Y’s data plane, i.e., Y isa 
"convergence layer". Examples include Frame Relay over ATM, IP over 
ATM, and IP over MPLS. While X over Y layering has met with only 
marginal success [TENNENHOUSE, WAKEMAN], there have been a few notable 
instances where efficiency can be and is gained. In particular, "X 
over Y efficiencies" can be realized when there is a kind of 
"isomorphism" between the X and Y (i.e., there is a small convergence 
layer). In these cases X’s data, and possibly control traffic, are 
"encapsulated" and transported over Y. Examples include Frame Relay 
over ATM, and Frame Relay, AALS ATM and Ethernet over L2TPv3 
[L2TPV3]; the simplifying factors here are that there is no 
requirement that a shared clock be recovered by the communicating end 
points, and that control-plane interworking is minimized. An 
alternative is to interwork the X and Y’s control and data planes; 
control-plane interworking is discussed below. 


3.5. Second Order Effects 


IP over ATM provides an excellent example of unanticipated second 
order effects. In particular, Romanov and Floyd’s classic study on 
TCP good-put [ROMANOV] on ATM showed that large UBR buffers (larger 
than one TCP window size) are required to achieve reasonable 
performance, that packet discard mechanisms (such as Early Packet 
Discard, or EPD) improve the effective usage of the bandwidth and 
that more elaborate service and drop strategies than FIFO+EPD, such 
as per VC queuing and accounting, might be required at the bottleneck 
to ensure both high efficiency and fairness. Though all studies 
clearly indicate that a buffer size not less than one TCP window size 
is required, the amount of extra buffer required naturally depends on 
the packet discard mechanism used and is still an open issue. 


Examples of this kind of problem with layering abound in practical 
networking. Consider, for example, the effect of IP transport’s 
implicit assumptions of lower layers. In particular: 


o Packet loss: TCP assumes that packet losses are indications of 
congestion, but sometimes losses are from corruption on a wireless 
link [RFC3115]. 


o Reordered packets: TCP assumes that significantly reordered 


packets are indications of congestion. This is not always the 
case [FLOYD2001]. 
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o Round-trip times: TCP measures round-trip times, and assumes that 
the lack of an acknowledgment within a period of time based on the 
measured round-trip time is a packet loss, and therefore an 
indication of congestion [KARN]. 


o Congestion control: TCP congestion control implicitly assumes that 
all the packets in a flow are treated the same by the network, but 
this is not always the case [HANDLEY]. 


3.6. Instantiating the EOSL Model with IP 


While IP is being proposed as a transport for almost everything, the 
base assumption, that Everything over IP (EOIP) will result in OPEX 
and CAPEX efficiencies, requires critical examination. In 
particular, while it is the case that many protocols can be 
efficiently transported over an IP network (specifically, those 
protocols that do not need to recover synchronization between the 
communication end points, such as Frame Relay, Ethernet, and AAL5 
ATM), the Simplicity and Layering Principles suggest that EOIP may 
not represent the most efficient convergence strategy for arbitrary 
services. Rather, a more CAPEX and OPEX efficient convergence layer 
might be much lower (again, this behavior is predicted by the 
Simplicity Principle). 


An example of where EOIP would not be the most OPEX and CAPEX 
efficient transport would be in those cases where a service or 
protocol needed SONET-like restoration times (e.g., 50ms). It is not 
hard to imagine that it would cost more to build and operate an IP 
network with this kind of restoration and convergence property (if 
that were even possible) than it would to build the SONET network in 
the first place. 


4. Avoid the Universal Interworking Function 


While there have been many implementations of Universal Interworking 
unction (UIWF), IWF approaches have been problematic at large scale. 
his concern is codified in the Principle of Minimum Intervention 
BRYANT]: 


"To minimise the scope of information, and to improve the efficiency 


of data flow through the Encapsulation Layer, the payload should, 
where possible, be transported as received without modification." 
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4.1. Avoid Control Plane Interworking 


This corollary is best understood in the context of the integrated 
solutions space. In this case, the architecture and design 
frequently achieves the worst of all possible worlds. This is due to 
the fact that such integrated solutions perform poorly at both ends 
of the performance/CAPEX/OPEX spectrum: the protocols with the least 
switching demand may have to bear the cost of the most expensive, 
while the protocols with the most stringent requirements often must 
make concessions to those with different requirements. Add to this 
the various control plane interworking issues and you have a large 
opportunity for failure. In summary, interworking functions should 
be restricted to data plane interworking and encapsulations, and 
these functions should be carried out at the edge of the network. 


As described above, interworking models have been successful in those 
cases where there is a kind of "isomorphism" between the layers being 
interworked. The trade-off here, frequently described as the 
"Integrated vs. Ships In the Night trade-off" has been examined at 
various times and at various protocol layers. In general, there are 
few cases in which such integrated solutions have proven efficient. 
Multi-protocol BGP [RFC2283] is a subtly different but notable 
exception. In this case, the control plane is independent of the 
format of the control data. That is, no control plane data 
conversion is required, in contrast with control plane interworking 
models such as the ATM/IP interworking envisioned by some soft-switch 
manufacturers, and the so-called "PNNI-MPLS SIN" interworking 
[ATMMPLS]. 


5. Packet versus Circuit Switching: Fundamental Differences 


Conventional wisdom holds that packet switching (PS) is inherently 
more efficient than circuit switching (CS), primarily because of the 
efficiencies that can be gained by statistical multiplexing and the 
fact that routing and forwarding decisions are made independently in 


a hop-by-hop fashion [[MOLINERO2002]. Further, it is widely assumed 
that IP is simpler that circuit switching, and hence should be more 
economical to deploy and manage [MCK2002]. However, if one examines 
these and related assumptions, a different picture emerges (see for 
example [ODLYZKO98]). The following sections discuss these 
assumptions. 

5.1. Is PS is inherently more efficient than CS? 


It is well known that packet switches make efficient use of scarce 
bandwidth [BARAN]. This efficiency is based on the statistical 
multiplexing inherent in packet switching. However, we continue to 
be puzzled by what is generally believed to be the low utilization of 


Bush, et. al. Informational [Page 13] 


RFC 3439 Internet Architectural Guidelines December 2002 


Internet backbones. The first question we might ask is what is the 
current average utilization of Internet backbones, and how does that 
relate to the utilization of long distance voice networks? Odlyzko 
and Coffman [ODLYZKO,COFFMAN] report that the average utilization of 
links in the IP networks was in the range between 3% and 20% 
(corporate intranets run in the 3% range, while commercial Internet 


backbones run in the 15-20% range). On the other hand, the average 
utilization of long haul voice lines is about 33%. In addition, for 


2002, the average utilization of optical networks (all services) 
appears to be hovering at about 11%, while the historical average is 
approximately 15% [ML2002]. The question then becomes why we see 
such utilization levels, especially in light of the assumption that 
PS is inherently more efficient than CS. The reasons cited by 
Odlyzko and Coffman include: 


(i). Internet traffic is extremely asymmetric and bursty, but 
links are symmetric and of fixed capacity (i.e., don’t know 
the traffic matrix, or required link capacities); 


(ii). It is difficult to predict traffic growth on a link, so 
operators tend to add bandwidth aggressively; 


(iii). Falling prices for coarser bandwidth granularity make it 
appear more economical to add capacity in large increments. 


Other static factors include protocol overhead, other kinds of 
equipment granularity, restoration capacity, and provisioning lag 
time all contribute to the need to "over-provision" [MC2001]. 


5.2. Is PS simpler than CS? 


The end-to-end principle can be interpreted as stating that the 
complexity of the Internet belongs at the edges. However, today’s 
Internet backbone routers are extremely complex. Further, this 
complexity scales with line rate. Since the relative complexity of 
circuit and packet switching seems to have resisted direct analysis, 
we instead examine several artifacts of packet and circuit switching 
as complexity metrics. Among the metrics we might look at are 
software complexity, macro operation complexity, hardware complexity, 
power consumption, and density. Each of these metrics is considered 
below. 
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5.2.1. Software/Firmware Complexity 


One measure of software/firmware complexity is the number of 
instructions required to program the device. The typical software 
image for an Internet router requires between eight and ten million 
instructions (including firmware), whereas a typical transport switch 
requires on average about three million instructions [MCK2002]. 


This difference in software complexity has tended to make Internet 
routers unreliable, and has notable other second order effects (e.g., 
it may take a long time to reboot such a router). As another point 
of comparison, consider that the AT&T (Lucent) 5ESS class 5 switch, 
which has a huge number of calling features, requires only about 
twice the number of lines of code as an Internet core router [EICK]. 


Finally, since routers are as much or more software than hardware 
devices, another result of the code complexity is that the cost of 
routers benefits less from Moore’s Law than less software-intensive 
devices. This causes a bandwidth/device trade-off that favors 
bandwidth more than less software-intensive devices. 


5.2.2. Macro Operation Complexity 


An Internet router’s line card must perform many complex operations, 
including processing the packet header, longest prefix match, 
generating ICMP error messages, processing IP header options, and 
buffering the packet so that TCP congestion control will be effective 
(this typically requires a buffer of size proportional to the line 
rate times the RTT, so a buffer will hold around 250 ms of packet 
data). This doesn’t include route and packet filtering, or any QoS 
or VPN filtering. 


On the other hand, a transport switch need only to map ingress time- 
slots to egress time-slots and interfaces, and therefore can be 
considerably less complex. 


5.2.3. Hardware Complexity 


One measure of hardware complexity is the number of logic gates ona 
line card [MOLINERO2002]. Consider the case of a high-speed Internet 
router line card: An 0C192 POS router line card contains at least 30 
million gates in ASICs, at least one CPU, 300 Mbytes of packet 
buffers, 2 Mbytes of forwarding table, and 10 Mbytes of other 
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state memory. On the other hand, a comparable transport switch line 
card has 7.5 million logic gates, no CPU, no packet buffer, no 
forwarding table, and an on-chip state memory. Rather, the line-card 
of an electronic transport switch typically contains a SONET framer, 
a chip to map ingress time-slots to egress time-slots, and an 
interface to the switch fabric. 


5.2.4. Power 


Since transport switches have traditionally been built from simpler 
hardware components, they also consume less power [PMC]. 


5.2.5. Density 


The highest capacity transport switches have about four times the 
capacity of an IP router [CISCO,CIENA], and sell for about one-third 
as much per Gigabit/sec. Optical (000) technology pushes this 
complexity difference further (e.g., tunable lasers, MEMs switches. 
e.g., [CALIENT]), and DWDM multiplexers provide technology to build 
extremely high capacity, low power transport switches. 


A related metric is physical footprint. In general, by virtue of 
their higher density, transport switches have a smaller "per-gigabit" 
physical footprint. 


5.2.6. Fixed versus variable costs 


Packet switching would seem to have high variable cost, meaning that 
it costs more to send the n-th piece of information using packet 
switching than it might in a circuit switched network. Much of this 
advantage is due to the relatively static nature of circuit 
switching, e.g., circuit switching can take advantage of of pre- 
scheduled arrival of information to eliminate operations to be 
performed on incoming information. For example, in the circuit 
switched case, there is no need to buffer incoming information, 
perform loop detection, resolve next hops, modify fields in the 
packet header, and the like. Finally, many circuit switched networks 
combine relatively static configuration with out-of-band control 
planes (e.g., SS7), which greatly simplifies data-plane switching. 
The bottom line is that as data rates get large, it becomes more and 
more complex to switch packets, while circuit switching scales more 
or less linearly. 
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532 Te “QOS 


While the components of a complete solution for Internet QoS, 
including call admission control, efficient packet classification, 
and scheduling algorithms, have been the subject of extensive 
research and standardization for more than 10 years, end-to-end 
signaled QoS for the Internet has not become a reality. 
Alternatively, QoS has been part of the circuit switched 
infrastructure almost from its inception. On the other hand, QoS is 
usually deployed to determine queuing disciplines to be used when 
there is insufficient bandwidth to support traffic. But unlike voice 
traffic, packet drop or severe delay may have a much more serious 
effect on TCP traffic due to its congestion-aware feedback loop (in 
particular, TCP backoff/slow start). 


5.2.8. Flexibility 


A somewhat harder to quantify metric is the inherent flexibility of 
the Internet. While the Internet’s flexibility has led to its rapid 
growth, this flexibility comes with a relatively high cost at the 
edge: the need for highly trained support personnel. A standard rule 
of thumb is that in an enterprise setting, a single support person 
suffices to provide telephone service for a group, while you need ten 
computer networking experts to serve the networking requirements of 
the same group [ODLYZKO98A]. This phenomenon is also described in 
[PERROW] . 


5.3. Relative Complexity 


The relative computational complexity of circuit switching as 
compared to packet switching has been difficult to describe in formal 
terms [PARK]. As such, the sections above seek to describe the 
complexity in terms of observable artifacts. With this in mind, it 
is clear that the fundamental driver producing the increased 
complexities outlined above is the hop-by-hop independence (HBHTI) 
inherent in the IP architecture. This is in contrast to the end to 
end architectures such as ATM or Frame Relay. 


[WILLINGER2002] describes this phenomenon in terms of the robustness 
requirement of the original Internet design, and how this requirement 
has the driven complexity of the network. In particular, they 
describe a "complexity/robustness" spiral, in which increases in 
complexity create further and more serious sensitivities, which then 
requires additional robustness (hence the spiral). 
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The important lesson of this section is that the Simplicity 
Principle, while applicable to circuit switching as well as packet 
switching, is crucial in controlling the complexity (and hence OPEX 
and CAPEX properties) of packet networks. This idea is reinforced by 
the observation that while packet switching is a younger, less mature 
discipline than circuit switching, the trend in packet switches is 
toward more complex line cards, while the complexity of circuit 
switches appears to be scaling linearly with line rates and aggregate 
capacity. 


1. HBHI and the OPEX Challenge 


As a result of HBHI, we need to approach IP networks in a 
fundamentally different way than we do circuit based networks. In 
particular, the major OPEX challenge faced by the IP network is that 
debugging of a large-scale IP network still requires a large degree 
of expertise and understanding, again due to the hop-by-hop 
independence inherent in a packet architecture (again, note that this 
hop-by-hop independence is not present in virtual circuit networks 
such as ATM or Frame Relay). For example, you may have to visit a 
large set of your routers only to discover that the problem is 
external to your own network. Further, the debugging tools used to 
diagnose problems are also complex and somewhat primitive. Finally, 
IP has to deal with people having problems with their DNS or their 
mail or news or some new application, whereas this is usually not the 
case for TDM/ATM/etc. In the case of IP, this can be eased by 
improving automation (note that much of what we mention is customer 
facing). In general, there are many variables external to the 
network that effect OPEX. 


Finally, it is important to note that the quantitative relationship 
between CAPEX, OPEX, and a network’s inherent complexity is not well 
understood. In fact, there are no agreed upon and quantitative 
metrics for describing a network’s complexity, so a precise 
relationship between CAPEX, OPEX, and complexity remains elusive. 


The Myth of Over-Provisioning 


As noted in [MC2001] and elsewhere, much of the complexity we observe 
in today’s Internet is directed at increasing bandwidth utilization. 
As a result, the desire of network engineers to keep network 
utilization below 50% has been termed "over-provisioning". However, 
this use of the term over-provisioning is a misnomer. Rather, in 
modern Internet backbones the unused capacity is actually protection 
capacity. In particular, one might view this as "1:1 protection at 
the IP layer". Viewed in this way, we see that an IP network 
provisioned to run at 50% utilization is no more over-provisioned 
than the typical SONET network. However, the important advantages 
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that accrue to an IP network provisioned in this way include close to 


speed of light delay and close to zero packet loss [FRALEIGH]. These 
benefits can been seen as a "side-effect" of 1:1 protection 
provisioning. 


There are also other, system-theoretic reasons for providing 1:1-like 
protection provisioning. Most notable among these reasons is that 
packet-switched networks with in-band control loops can become 
unstable and can experience oscillations and synchronization when 
congested. Complex and non-linear dynamic interaction of traffic 
means that congestion in one part of the network will spread to other 
parts of the network. When routing protocol packets are lost due to 
congestion or route-processor overload, it causes inconsistent 
routing state, and this may result in traffic loops, black holes, and 
lost connectivity. Thus, while statistical multiplexing can in 
theory yield higher network utilization, in practice, to maintain 
consistent performance and a reasonably stable network, the dynamics 
of the Internet backbones favor 1:1 provisioning and its side effects 
to keep the network stable and delay low. 


7. The Myth of Five Nines 


Paul Baran, in his classic paper, "SOME PERSPECTIVES ON NETWORKS-— 
PAST, PRESENT AND FUTURE", stated that "The tradeoff curves between 
cost and system reliability suggest that the most reliable systems 
might be built of relatively unreliable and hence low cost elements, 
if it is system reliability at the lowest overall system cost that is 
at issue" [BARAN77]. 


Today we refer to this phenomenon as "the myth of five nines". 
Specifically, so-called five nines reliability in packet network 
elements is consider a myth for the following reasons: First, since 
80% of unscheduled outages are caused by people or process errors 
[SCOTT], there is only a 20% window in which to optimize. Thus, in 
order to increase component reliability, we add complexity 
(optimization frequently leads to complexity), which is the root 


cause of 80% of the unplanned outages. This effectively narrows the 
20% window (i.e., you increase the likelihood of people and process 
failure). This phenomenon is also characterized as a 


"Complexity/robustness" spiral [WILLINGER2002], in which increases in 
complexity create further and more serious sensitivities, which then 
requires additional robustness, and so on (hence the spiral). 


The conclusion, then is that while a system like the Internet can 
reach five-nines-like reliability, it is undesirable (and likely 
impossible) to try to make any individual component, especially the 
most complex ones, reach that reliability standard. 
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8. Architectural Component Proportionality Law 


As noted in the previous section, the computational complexity of 
packet switched networks such as the Internet has proven difficult to 
describe in formal terms. However, an intuitive, high level 
definition of architectural complexity might be that the complexity 
of an architecture is proportional to its number of components, and 
that the probability of achieving a stable implementation of an 
architecture is inversely proportional to its number of components. 
As described above, components include discrete elements such as 
hardware elements, space and power requirements, as well as software, 
firmware, and the protocols they implement. 


Stated more abstractly: 
Let 
A be a representation of architecture A, 


|a| be number of distinct components in the service 
delivery path of architecture A, 


w be a monotonically increasing function, 


P be the probability of a stable implementation of an 
architecture, and let 


Then 
Complexity(A) = O(w(|A|)) 
P (A) = O(1/w(|A|)) 
where 
O(f) = {g:N->R | there exists c > 0 and n such that g(n) 


< cxf (n) } 


[That is, O(f) comprises the set of functions g for which 
there exists a constant c and a number n, such that g(n) is 
smaller or equal to c*f(n) for all n. That is, O(f) is the 
set of all functions that do not grow faster than f, 
disregarding constant factors] 


Interestingly, the Highly Optimized Tolerance (HOT) model [HOT] 
attempts to characterize complexity in general terms (HOT is one 
recent attempt to develop a general framework for the study of 
complexity, and is a member of a family of abstractions generally 
termed "the new science of complexity" or "complex adaptive 
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systems"). Tolerance, in HOT semantics, means that "robustness in 
complex systems is a constrained and limited quantity that must be 
carefully managed and protected." One focus of the HOT model is to 
characterize heavy-tailed distributions such as Complexity(A) in the 
above example (other examples include forest fires, power outages, 
and Internet traffic distributions). In particular, Complexity (A) 
attempts to map the extreme heterogeneity of the parts of the system 
(Internet), and the effect of their organization into highly 
structured networks, with hierarchies and multiple scales. 


8.1. Service Delivery Paths 


The Architectural Component Proportionality Law (ACPL) states that 
the complexity of an architecture is proportional to its number of 
components. 


COROLLARY: Minimize the number of components in a service delivery 
path, where the service delivery path can be a protocol path, a 
software path, or a physical path. 


This corollary is an important consequence of the ACPL, as the path 
between a customer and the desired service is particularly sensitive 
to the number and complexity of elements in the path. This is due to 
the fact that the complexity "smoothing" that we find at high levels 
of aggregation [ZHANG] is missing as you move closer to the edge, as 
well as having complex interactions with backoffice and CRM systems. 
Examples of architectures that haven’t found a market due to this 
effect include TINA-based CRM systems, CORBA/TINA based service 
architectures. The basic lesson here was that the only possibilities 
for deploying these systems were "Limited scale deployments (such) as 
in Starvision can avoid coping with major unproven scalability 
issues", or "Otherwise need massive investments (like the carrier- 
grade ORB built almost from scratch)" [TINA]. In other words, these 
systems had complex service delivery paths, and were too complex to 
be feasibly deployed. 


9. Conclusions 


This document attempts to codify long-understood Internet 
architectural principles. In particular, the unifying principle 
described here is best expressed by the Simplicity Principle, which 
states complexity must be controlled if one hopes to efficiently 
scale a complex object. The idea that simplicity itself can lead to 
some form of optimality has been a common theme throughout history, 
and has been stated in many other ways and along many dimensions. 
For example, consider the maxim known as Occam’s Razor, which was 
formulated by the medieval English philosopher and Franciscan monk 
William of Ockham (ca. 1285-1349), and states "Pluralitas non est 
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ponenda sine neccesitate" or "plurality should not be posited without 
necessity." (hence Occam’s Razor is sometimes called "the principle 
of unnecessary plurality" and " the principle of simplicity"). A 
perhaps more contemporary formulation of Occam’s Razor states that 
the simplest explanation for a phenomenon is the one preferred by 
nature. Other formulations of the same idea can be found in the 
KISS (Keep It Simple Stupid) principle and the Principle of Least 
Astonishment (the assertion that the most usable system is the one 
that least often leaves users astonished). [WILLINGER2002] provides 
a more theoretical discussion of "robustness through simplicity", and 
in discussing the PSTN, [KUHN87] states that in most systems, "a 
trade-off can be made between simplicity of interactions and 
looseness of coupling". 


When applied to packet switched network architectures, the Simplicity 
Principle has implications that some may consider heresy, e.g., that 
highly converged approaches are likely to be less efficient than 


"less converged" solutions. Otherwise stated, the "optimal" 
convergence layer may be much lower in the protocol stack that is 
conventionally believed. In addition, the analysis above leads to 


several conclusions that are contrary to the conventional wisdom 
surrounding packet networking. Perhaps most significant is the 
belief that packet switching is simpler than circuit switching. This 
belief has lead to conclusions such as "since packet is simpler than 
circuit, it must cost less to operate". This study finds to the 
contrary. In particular, by examining the metrics described above, 
we find that packet switching is more complex than circuit switching. 
Interestingly, this conclusion is borne out by the fact that 
normalized OPEX for data networks is typically significantly greater 
than for voice networks [ML2002]. 


Finally, the important conclusion of this work is that for packet 
networks that are of the scale of today’s Internet or larger, we must 
strive for the simplest possible solutions if we hope to build cost 
effective infrastructures. This idea is eloquently stated in 
[DOYLE2002]: "The evolution of protocols can lead to a 
robustness/complexity/fragility spiral where complexity added for 
robustness also adds new fragilities, which in turn leads to new and 
thus spiraling complexities". This is exactly the phenomenon that 
the Simplicity Principle is designed to avoid. 


10. Security Considerations 


This document does not directly effect the security of any existing 
Internet protocol. However, adherence to the Simplicity Principle 
does have a direct affect on our ability to implement secure systems. 
In particular, a system’s complexity grows, it becomes more 
difficult to model and analyze, and hence it becomes more difficult 
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to find and understand the security implications inherent in its 
architecture, design, and implementation. 
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